Named Entity Chunking Techniques in Supervised Learning for Japanese Named Entity Recognition
Authors
Abstract
This paper focuses on the issue of named entity chunking in Japanese named entity recognition. We apply the supervised decision list learning method to Japanese named entity recognition. We also investigate and incorporate several named-entity noun phrase chunking techniques and experimentally evaluate and compare their performance. In addition, we propose a method for incorporating richer contextual information as well as patterns of constituent morphemes within a named entity, which have not been considered in previous research, and show that the proposed method outperforms these previous approaches.

1 Introduction

It is widely agreed that named entity recognition is an important step for various applications of natural language processing such as information retrieval, machine translation, information extraction and natural language understanding. In the English language, the task of named entity recognition is one of the tasks of the Message Understanding Conference (MUC) (e.g., MUC-7 (1998)) and has been studied intensively. In the Japanese language, several recent conferences, such as MET (Multilingual Entity Task, MET-1 (Maiorano, 1996) and MET-2 (MUC, 1998)) and IREX (Information Retrieval and Extraction Exercise) Workshop (IREX Committee, 1999), focused on named entity recognition as one of their contest tasks, thus promoting research on Japanese named entity recognition.

In Japanese named entity recognition, it is quite common to apply morphological analysis as a preprocessing step and to segment the sentence string into a sequence of morphemes. Then, hand-crafted pattern matching rules and/or a statistical named entity recognizer are applied to recognize named entities. It is often the case that named entities to be recognized have different segmentation boundaries from those of morphemes obtained by the morphological analysis. For example, in our analysis of the IREX workshop's training corpus of named entities, about half of the named entities have segmentation boundaries that are different from the result of morphological analysis by a Japanese morphological analyzer BREAKFAST (Sassano et al., 1997) (section 2). Thus, in Japanese named entity recognition, among the most difficult problems is how to recognize such named entities that have segmentation boundary mismatch against the morphemes obtained by morphological analysis. Furthermore, in almost 90% of cases of those segmentation boundary mismatches, named entities to be recognized can be decomposed into several morphemes as their constituents. This means that the problem of recognizing named entities in those cases can be solved by incorporating techniques of base noun phrase chunking (Ramshaw and Marcus, 1995).

In this paper, we focus on the issue of named entity chunking in Japanese named entity recognition. First, we take a supervised learning approach rather than a hand-crafted rule based approach, because the former is more promising than the latter with respect to the amount of human labor it requires, as well as its adaptability to a new domain or a new definition of named entities. In general, creating training data for supervised learning is somewhat easier than creating pattern matching rules by hand. Next, we apply Yarowsky's method for supervised decision list learning¹ (Yarowsky, 1994) to

¹ We choose the decision list learning method as the
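
The introduction notes that many named entities can be decomposed into several morphemes, so recognition can be treated as chunking in the style of base noun phrase chunking (Ramshaw and Marcus, 1995). As a minimal sketch of that idea, the Python below assigns one B/I/O chunk tag to each morpheme and recovers entity spans from the tags. The function names, the `ORGANIZATION` label, and the example sentence are illustrative assumptions; this excerpt does not specify which chunk encodings the paper evaluates, and the sketch covers only entities whose boundaries coincide with morpheme boundaries.

```python
from typing import List, Tuple

# (start morpheme index, end index exclusive, entity type); hypothetical layout
Span = Tuple[int, int, str]


def encode_iob(morphemes: List[str], entities: List[Span]) -> List[str]:
    """Assign one chunk tag per morpheme: B-X begins an entity of type X,
    I-X continues it, and O marks morphemes outside any entity."""
    tags = ["O"] * len(morphemes)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def decode_iob(tags: List[str]) -> List[Span]:
    """Recover entity spans from a tag sequence."""
    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = (
            tag.startswith("I-") and start is not None and tag[2:] == etype
        )
        if continues:
            continue
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag is treated leniently as the start of a new entity.
            start, etype = i, tag[2:]
    return spans


# Illustrative example (not taken from the IREX corpus): an organization
# name that a morphological analyzer splits into two morphemes.
morphemes = ["東京", "大学", "の", "教授"]
entities = [(0, 2, "ORGANIZATION")]

tags = encode_iob(morphemes, entities)
recovered = decode_iob(tags)
```

Under this representation a tagger only has to predict one label per morpheme, which is what lets a per-morpheme classifier such as a decision list be applied to multi-morpheme entities.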
Similar resources
Minimally Supervised Japanese Named Entity Recognition: Resources and Evaluation
Approaches to named entity recognition that rely on hand-crafted rules and/or supervised learning techniques have limitations in terms of their portability into new domains as well as in the robustness over time. For the purpose of overcoming those limitations, this paper evaluates named entity chunking and classification techniques in Japanese named entity recognition in the context of minimall...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Unsupervised Part-Of-Speech Tagging Supporting Supervised Methods
This paper investigates the utility of an unsupervised part-of-speech (PoS) system in a task-oriented way. We use PoS labels as features for different supervised NLP tasks: Word Sense Disambiguation, Named Entity Recognition and Chunking. Further, we explore how much supervised tagging can gain from unsupervised tagging. A comparative evaluation between variants of systems using standard PoS, un...
Improving Persian Named Entity Recognition Using the Kasra Ezafe
Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), as well as dates, currencies and percentages, are identified in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...